The renormalization group has proven to be a very powerful
tool in physics for treating systems with many length
scales. Here we show how it can be adapted to provide a 
new class of algorithms for discrete optimization. The heart 
of our method uses renormalization and recursion, and these 
processes are embedded in a genetic algorithm. The sys- 
tem is self-consistently optimized on all scales, leading to a 
high probability of finding the ground state configuration. To 
demonstrate the generality of such an approach, we perform 
tests on traveling salesman and spin glass problems. The re- 
sults show that our "genetic renormalization algorithm" is 
extremely powerful. 
The study of disordered systems is an active and challenging
subject [?], and in many cases some of the most
basic consequences of randomness remain subject to con- 
troversy. Given that numerical calculations of ground 
state properties can shed light on these issues, it is not 
surprising that more and more such calculations are being
performed. Our goal here is to introduce and test
a new general purpose approach for finding ground states 
in disordered and frustrated systems. In this letter we il- 
lustrate its use on the traveling salesman problem and on 
the spin glass problem, showing that the ground states 
are found with a high probability. More generally, our 
novel approach should be very useful for many classes 
of discrete optimization problems and is thus of major 
interdisciplinary interest. 

Although it is often claimed that physical insight into 
disordered systems should lead to improved optimization 
algorithms, thus far, there has been very little substance 
to uphold this view. Aside from simulated annealing [?]
and generalizations thereof [?], physics-inspired ideas,
ranging from replica symmetry breaking to energy landscapes,
have had little impact on practical algorithmic developments
in optimization. Nevertheless, several ideas from physics
seemed promising, including renormalization and hierarchical
constructions [?]. Perhaps the
impact of these attempts has been minor because the re- 
sulting algorithms were not sufficiently powerful to be 
competitive with the state of the art. In our work, we 
have found that by carefully combining some of these 
ideas, namely renormalization and recursion, and by em- 
bedding them in a genetic algorithm approach, highly 
effective algorithms could be achieved. We thus believe 
that the essence of the renormalization group can be
fruitfully applied to discrete optimization, and we expect
the use of this type of algorithm to become widespread 
in the near future. 

Let us begin by sketching some of the standard approaches
for tackling hard discrete optimization problems [?]. For
such problems, it is believed that there
are no fast algorithms for finding the optimum, so much 
effort has concentrated on the goal of quickly obtaining 
"good" near-optimum solutions by heuristic means. One 
of the simplest heuristic algorithms is local search in 
which a few variables are changed at a time in the search 
for lower energy configurations. This heuristic and nu- 
merous generalizations thereof such as simulated anneal- 
ing optimize very effectively on small scales, that is on 
scales involving a small number of variables, but break
down for the larger scales that require the modification
of many variables simultaneously. To tackle these large
scales directly, genetic algorithms use a "crossing"
procedure which takes two good configurations (parents) 
and generates a child which combines large parts of its 
parents. A population of configurations is evolved from 
one generation to the next using these crossings followed 
by a selection of the best children. Unfortunately, this 
approach does not work well in practice because it is 
very difficult to take two good parents and cross them 
to make a child which is as good as them. This is the 
major bottleneck of genetic algorithms and is responsi- 
ble for their limited use. For an optimization scheme 
to overcome these difficulties, it must explicitly treat all 
the scales in the problem simultaneously, the different 
scales being tightly coupled. To implement such a treat- 
ment, we rely on ideas from the renormalization group, 
the physicist's favorite tool for treating problems with 
many scales [?]. Our approach is based on embedding
renormalization and recursion within a genetic algorithm, 
leading to what we call a "genetic renormalization algo- 
rithm" (GRA). To best understand the working of this 
approach, we now show how we have implemented it in 
two specific cases, the traveling salesman and the spin 
glass problems. 

The traveling salesman problem (TSP) — This routing 
problem is motivated by applications in the telecommunication
and transportation industries. Given N cities and
their mutual distances, one is to find the shortest closed
path (tour) visiting each of the cities exactly once [?]. In
genetic algorithms, one takes two parents (good tours)
from a population and finds the sub-paths they have in
common. Then a child is built by reconnecting those
sub-paths, either randomly or by using parts belonging
to the parents if possible; ultimately, these connections 
are not very good and lead to a child which is less fit 
than its parents. 

In our approach, instead of creating children as de- 
scribed, we engineer new configurations from sub-paths 
that are frequently shared in the population. In practice 
we pick k "parents" at random and determine their com- 
mon sub-paths: these form the patterns which we select 
before engineering the child. Then we wish to find the 
very best child which is compatible with these patterns. 
(This child should thus be at least as good as its best 
parent.) For this new problem, each sub-path is replaced 
by its two end cities and one bond which connects them; 
together with the cities which do not belong to any of 
the patterns, this defines a new, "renormalized" TSP
with fewer cities. Note that in this new TSP, we have 
removed all the cities inside the selected sub-paths, and 
have "frozen-in" bonds to connect their end-points; since 
we force these bonds to be in the tours, the renormal- 
ized problem is really a constrained TSP. The distance 
between two cities is the same as in the non-renormalized 
problem if they are not connected by a frozen bond, oth- 
erwise their distance is given by the length of the sub- 
path associated with the frozen bond. If this reduced 
problem is small enough, it can be solved by direct enu- 
meration. Otherwise, we "open up the Russian doll" and 
solve this renormalized problem recursively! Since each 
parent is compatible with the selected patterns, each of 
them corresponds to a legal tour for the renormalized 
problem. Thus we can use these tours in the first gener- 
ation of the recursive call of GRA: this way none of the 
information contained in the tours is lost. 
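The construction of the renormalized instance described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the tests: tours are lists of city indices, cities have planar coordinates, and the helper names (tour_edges, shared_subpaths, renormalize_tsp) are hypothetical. The sketch also assumes the parents are not all identical, so that the shared bonds form open chains rather than a complete tour.

```python
import math

def tour_edges(tour):
    """Undirected bonds of a closed tour given as a list of city indices."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def shared_subpaths(parents):
    """Maximal chains of bonds that are present in every parent tour."""
    common = set.intersection(*(tour_edges(t) for t in parents))
    nbrs = {}
    for e in common:
        a, b = tuple(e)
        nbrs.setdefault(a, []).append(b)
        nbrs.setdefault(b, []).append(a)
    paths, seen = [], set()
    for start in sorted(nbrs):
        if len(nbrs[start]) != 1 or start in seen:
            continue                      # only start walking from a chain end
        path, prev, cur = [start], None, start
        while True:
            nxt = [c for c in nbrs[cur] if c != prev]
            if not nxt:
                break                     # reached the other end of the chain
            prev, cur = cur, nxt[0]
            path.append(cur)
        seen.update((path[0], path[-1]))
        paths.append(path)
    return paths

def renormalize_tsp(coords, parents):
    """Return the cities kept and the frozen bonds with their lengths."""
    def dist(a, b):
        return math.dist(coords[a], coords[b])
    paths = shared_subpaths(parents)
    interior = {c for p in paths for c in p[1:-1]}   # cities that are removed
    cities = [c for c in range(len(coords)) if c not in interior]
    frozen = {frozenset((p[0], p[-1])): sum(dist(a, b) for a, b in zip(p, p[1:]))
              for p in paths}
    return cities, frozen

# Six cities on a 2 x 3 grid and two parent tours sharing two sub-paths.
coords = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
parents = [[0, 1, 2, 3, 4, 5], [0, 1, 2, 4, 3, 5]]
cities, frozen = renormalize_tsp(coords, parents)
```

In this small example the shared sub-path 2-1-0-5 is replaced by a single frozen bond between cities 2 and 5 whose length is that of the sub-path, and the shared bond 3-4 is frozen as it stands; cities 0 and 1 disappear from the renormalized problem.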

How does one choose the number of parents, k?
Clearly, the tour parts that are shared by all k parents 
decrease as k grows and the child becomes less and less 
constrained. Increasing k then has the effect of improv- 
ing the best possible child but also of making the corre- 
sponding search more difficult, so the choice of k results 
from a compromise. Genetic algorithms being biologi- 
cally motivated, the choice k = 2 may seem natural, but 
it need not be optimal and empirically we find it not to 
be. We do not claim to be the first to propose the use 
of more than two parents [?], but in previous proposals,
the performance turned out to be lackluster. The reason 
is that they did not include the two essential ingredients: 
(i) a selection of patterns; (ii) a search for the best child 
consistent with the given patterns. 

A bird's eye view of our algorithm is as follows. We
start with a population of M randomly generated tours;
a simplified version of the Lin-Kernighan local search
algorithm is applied to these tours, which form the first
generation. To obtain the next generation, we first produce
by recursion as many children as there are parents;
then the local search improvement is applied to these
children; finally, duplications among the children and
children which present no improvement over their worst
parent are eliminated. The next generation consists of
the children remaining. The algorithm terminates when
there is only one individual left.

TABLE I. Tests on 5 instances from TSPLIB; the number
in the name of an instance represents its number of cities.
Δ_LK and Δ_GRA are the relative differences between the length
found by the corresponding algorithm and the optimum, τ_LK
and τ_GRA are the CPU times in seconds to treat one instance,
and P_GRA represents the probability for GRA to find the
optimum. Data for the GRA have been averaged over 10 runs.
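The population dynamics just outlined can be written as a short problem-independent loop. The Python sketch below is a toy illustration only: make_child stands in for the recursive solution of the renormalized problem, local_search for Lin-Kernighan, and all names are hypothetical. Keeping track of the best configuration seen so far, and stopping when no child survives, are assumptions added here so that the loop always returns something.

```python
import random

def next_generation(population, k, make_child, local_search, cost, rng):
    """Produce as many children as parents, then filter them."""
    children = []
    for _ in range(len(population)):
        parents = rng.sample(population, min(k, len(population)))
        child = local_search(make_child(parents))
        worst = max(cost(p) for p in parents)
        if cost(child) < worst:          # keep only improvements on the worst parent
            children.append(child)
    unique = []
    for c in children:                   # eliminate duplicated children
        if c not in unique:
            unique.append(c)
    return unique

def run(population, k, make_child, local_search, cost, seed=0):
    """Iterate generations until at most one individual is left."""
    rng = random.Random(seed)
    best = min(population, key=cost)
    while len(population) > 1:
        population = next_generation(population, k, make_child,
                                     local_search, cost, rng)
        if population:
            best = min(population + [best], key=cost)
    return best

# Toy problem: configurations are integers and the cost is the absolute value.
best = run(
    population=[9, -7, 5, 3, -8, 6],
    k=2,
    make_child=lambda ps: min(ps, key=abs),            # stand-in for the recursive solve
    local_search=lambda x: x - (x > 0) + (x < 0),      # one step toward zero
    cost=abs,
)
```

Because every surviving child is strictly better than the worst of its parents, the worst cost in the population decreases at each generation, which is what drives the loop to termination in this toy setting.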

If the local search is taken as given (and we are not 
concerned here about its detailed implementation), our 
algorithm has two parameters, the number M of tours 
used in the population and k the number of parents of 
a child. In our numerical experiments for the TSP, we 
have chosen M = 50 for the top-most level where we
treat the initial TSP instance, and M = 8 for the in- 
ner levels where renormalized instances are treated. Of 
course, other choices are possible, but we have not ex- 
plored them much. Let us just note that it is desirable 
to have M large enough to have plenty of diversity in the 
patterns which will be selected, thereby increasing one's 
chance of finding the ground state. However, there is a 
high computational cost for doing this, as each level of the 
recursion increases the CPU time multiplicatively. Thus 
the best strategy would probably be to have M decrease 
with the level of the recursion. Concerning the choice of 
the parameter k, a similar compromise has to be reached. 
The best quality solutions would be obtained with large 
k, but this would lead to many levels of recursion and 
thus to very long computation times. In practice, we increase
k dynamically until at least a threshold fraction of 10 % of
the bonds currently to be found remains unfrozen at this
step. This ensures that the renormalization
does not reduce the problem size too dramatically,
allowing good solutions to be found. For the instances 
we considered, nearly all values of k were between 2 and 
6, with 5 being the most probable value. 
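One possible reading of this dynamic choice of k is sketched below in Python. It is an assumption-laden illustration rather than the procedure actually used: parents are added in random order, a bond counts as frozen when it is shared by all current parents, and the names pick_parents and min_unfrozen are hypothetical.

```python
import random

def tour_edges(tour):
    """Undirected bonds of a closed tour given as a list of city indices."""
    n = len(tour)
    return {frozenset((tour[i], tour[(i + 1) % n])) for i in range(n)}

def pick_parents(population, rng, min_unfrozen=0.10):
    """Add parents one at a time until enough bonds are left unfrozen."""
    order = rng.sample(population, len(population))   # random parent order
    n_bonds = len(order[0])                           # a tour of N cities has N bonds
    parents = order[:2]
    common = tour_edges(parents[0]) & tour_edges(parents[1])
    for extra in order[2:]:
        if n_bonds - len(common) >= min_unfrozen * n_bonds:
            break                                     # enough freedom left for the child
        parents.append(extra)
        common &= tour_edges(extra)                   # fewer bonds stay frozen as k grows
    return parents, common

# Three tours of 30 cities that differ from each other by local swaps.
base = list(range(30))
t1 = base[:]; t1[1], t1[2] = t1[2], t1[1]
t2 = base[:]; t2[10], t2[11] = t2[11], t2[10]
parents, common = pick_parents([base, t1, t2], random.Random(0))
```

In this example any pair containing the unmodified tour leaves only 2 of the 30 bonds unfrozen, so a third parent is added; in every case 26 bonds end up frozen and 4 remain to be optimized.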

How well does the method work? For the TSP, it is 
standard practice to test heuristics on problems from the 
TSPLIB library [?]. We have tested our algorithm on
5 problems of that library for which the exact optima
are known. As can be seen in Table I, the improvement
over the local search is impressive (we use a DEC-α-500
work-station to treat these instances). Still better
results could be obtained by improving the local search
part. Several other groups (see [?] and chapter 7 in [?])
have fine-tuned their Lin-Kernighan algorithm both for
speed and for quality. In spite of the fact that our version
of LK is far less effective, we obtain results comparable
to theirs. We believe that this excellent performance is
possible because GRA incorporates the essential ingredi- 
ents which allow the optimization to be effective on all 
scales. To give evidence of this, we now show that GRA 
is also extremely effective on a very different problem. 

The spin glass problem (SGP) — Spin glasses have 
long been a subject of intense study in statistical physics. 
One of the simplest spin glass models is that of Edwards 
and Anderson, in which Ising spins (Si = ±1) are
placed on a lattice and the interactions are between nearest
neighbors only. The corresponding Hamiltonian is

$$H = -\sum_{\langle ij \rangle} J_{ij} S_i S_j$$

where the Jij are quenched random variables with zero
mean. For our purpose here, the spin glass problem con- 
sists in finding the spin values which minimize H . To 
find this minimum with a genetic algorithm approach, we 
need the "building blocks" of good configurations. This 
time, simply looking at the variables (spin orientations) 
which are shared between parents is not effective since 
the energy is unchanged when all the spins are flipped.
Instead, we consider correlations among the spins. The 
simplest correlation, whether two neighboring spins are 
parallel or anti-parallel, will suit our needs just fine. Con- 
sider first any set of spins; if the relative orientations of 
these spins are the same for all k parents, we say that 
they form a "pattern"; the values of the spins in that
pattern are then frozen up to an overall sign change.
Now we sharpen this notion of a pattern a bit: we require
the set of spins to be both maximal and connected,
and we call such a set a block. (Note that the patterns 
introduced for the TSP also had these two properties.) 
We can associate a fictitious or "blocked" spin to each 
such block to describe its state. Flipping this blocked 
spin corresponds to flipping all the spins in the block, 
a transformation which maintains the pattern (i.e., the 
relative orientations of the spins in the block). 

With these definitions, it is not difficult to see that each 
spin belongs to exactly one block (which may be of size 
1 though). Furthermore, the configurations compatible 
with the patterns shared by the k parents are obtained 
by specifying orientations for each blocked spin; this pro- 
cedure defines the space spanned by all possible children. 
Not surprisingly, the energy function (Hamiltonian) in 
this space is (up to an additive amount) quadratic in the 
blocked spin values, so finding the best possible child is 
again a spin glass problem, but with fewer spins! Because 
of this property, the renormalization/recursion approach 
can be used very effectively, similarly to what happened 
in the case of the TSP. 
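The decomposition of the spins into blocks can be illustrated with a small union-find sketch in Python. It assumes that the lattice is given as a list of nearest-neighbor bonds (i, j) and that each parent is a list of ±1 values; the function name find_blocks is hypothetical. Two neighboring spins are merged whenever their relative orientation is the same in all parents, so the resulting connected components are the maximal connected patterns.

```python
def find_blocks(bonds, parents):
    """Return, for each spin, the index of the block it belongs to."""
    n = len(parents[0])
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]      # path halving
            i = root[i]
        return i

    for i, j in bonds:
        # Same relative orientation S_i * S_j in every parent: merge the spins.
        if len({p[i] * p[j] for p in parents}) == 1:
            root[find(i)] = find(j)

    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]

# A chain of six spins and two parents.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
p1 = [1, 1, -1, 1, 1, -1]
p2 = [-1, -1, 1, 1, 1, 1]
block = find_blocks(bonds, [p1, p2])
```

Here the first three spins are globally flipped between the two parents, so they still form one block; spins 3 and 4 form a second block, and spin 5 is a block of size 1.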



To find the (renormalized) coupling between two 
blocked spins, proceed as follows. First put the two spins 
in the up state; unblock each spin so that one has all the 
spins of the initial system they are composed of. The 
coupling between the two blocked spins is obtained by 
summing the JijSiSj where Si belongs to the set defin- 
ing the first spin and Sj to that of the second. (Here, 
Si denotes the value (±1) of the spin i when its (unique) 
blocked spin is up. Note also that to obtain the total en- 
ergy of a blocked configuration, one also has to take into 
account the energy inside each blocked spin.) Finally, 
a straightforward calculation shows that this formalism 
carries over in the presence of an arbitrary magnetic field 
also. 
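A minimal Python sketch of this computation follows, again under assumptions of our own: the couplings are stored in a dictionary keyed by bonds (i, j), the reference configuration ref defines which orientation of each block is called "up" (here one of the parents), and block gives the block index of each spin. The names renormalize_sg and e_inside are hypothetical. The small renormalized problem is then solved by direct enumeration, and the result can be checked against the energy of the unblocked configuration.

```python
import itertools

def energy(J, s):
    """H = -sum over bonds of J_ij * S_i * S_j."""
    return -sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

def renormalize_sg(J, ref, block):
    """Couplings between blocked spins and the energy inside the blocks."""
    Jr, e_inside = {}, 0.0
    for (i, j), Jij in J.items():
        term = Jij * ref[i] * ref[j]     # bond value when both blocked spins are up
        a, b = sorted((block[i], block[j]))
        if a == b:
            e_inside -= term             # bond inside a block: constant energy
        else:
            Jr[(a, b)] = Jr.get((a, b), 0.0) + term
    return Jr, e_inside

# Chain of six spins split into three blocks (0,1,2), (3,4) and (5).
J = {(0, 1): 1.0, (1, 2): -1.0, (2, 3): 0.5, (3, 4): 1.0, (4, 5): -2.0}
ref = [1, 1, -1, 1, 1, -1]
block = [0, 0, 0, 1, 1, 2]
Jr, e_inside = renormalize_sg(J, ref, block)

# Direct enumeration of the blocked spins gives the best possible child.
best_sigma = min(itertools.product((-1, 1), repeat=3),
                 key=lambda sigma: energy(Jr, sigma))
best_child = [best_sigma[block[i]] * ref[i] for i in range(len(ref))]
best_energy = e_inside + energy(Jr, best_sigma)
```

The total energy of any blocked configuration is the constant e_inside plus a Hamiltonian that is quadratic in the blocked spins, which is why the reduced problem is again a spin glass problem.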

Given the construction of blocks and a local search 
routine (we use a version of the Kernighan-Lin [?]
algorithm (KL)), the GRA proceeds as before. For the
number of parents k, we follow the spirit of the procedure
used for the TSP: we increase k dynamically until
the size of the renormalized problem is at least 7.5 %
of that of the current problem. For this choice, k = 5 is the
most frequent value, and we find that the distribution of 
k is rather narrow. (Clearly, when k increases, the size 
of the renormalized problem increases rather rapidly.) 

Testing the algorithm is not easy as there is no library 
of solved SGP instances. Fortunately, when the grids 
are two-dimensional, there are very effective exact methods
for finding the optimum [?]. We thus performed
a first type of test where we ran our GRA on ten in- 
stances corresponding to toroidal grids of size 50 x 50 
with Jij = ±1. (The exact solutions were provided by J. 
Mitchell.) For these runs we set M = 5 + 0.2N for each 
level (N being the number of spins at that level). The
algorithm halted on the 6th, 7th, or 8th generation, and 
in all cases found the exact optimum. Furthermore, we 
measured the mean excess above the optimum for each 
generation. The first generation corresponds to simply 
using the local search, and had a mean excess above the 
optimum of 12 %. Thereafter, the mean excess energy 
decreased by a factor of 2 to 3 at each generation, until 
it hit 0. (Furthermore, instance-to-instance fluctuations
were small.) In terms of computation time, our local 
search took on average 0.02 seconds on these instances, 
and the average time taken by GRA was 16,000 seconds. 
This performance is competitive with that of the state 
of the art heuristic algorithm [?] developed specifically
for the SGP. Since this same property was found to be 
satisfied in the case of the TSP, there is good evidence 
that GRA is a general purpose and effective optimization 
strategy. 

As a second kind of check on our method, we considered
3-dimensional grids of size L x L x L with Gaussian
Jij's for which exact methods are not so effective. These
kinds of grids are of direct physical relevance [?]. Since
we did not know the exact optima, our analysis relied
on self-consistency: we considered we had found the optimum
when the most powerful version of our algorithm
(large M) output the same configuration with a probability
above 90 %. Measuring this probability requires performing
many runs, but once one has this putative optimum,
one can measure the performance of the algorithm
in a quantitative way. To achieve the precision required
we set M = N for the top level and M = 5 + 0.2N for
inner levels; then the probabilities to find the optimum
are as given in the last column of Table II. We also give
in this table the performance of the GRA with M = 15
for all the levels; for this choice of M, the algorithm is
one to two orders of magnitude slower than KL, but leads
to mean energy excesses that are 10 to 100 times smaller!
Overall, the quality of the solutions is excellent even with
a relatively small M, and we see that up to 1000 spins,
GRA is able to find the optimum with a high probability
provided M is large enough.

TABLE II. Tests on L x L x L SGP instances. Δ_KL and
Δ_GRA are the relative differences between the energy found
by the corresponding algorithm and the optimum, τ_KL and
τ_GRA are the CPU times in seconds to treat one instance.
τ_GRA and Δ_GRA are results for M = 15. P_GRA represents
the probability for GRA to find the optimum when M = N
at the top level and M = 5 + 0.2N for inner levels.

Discussion — For both the traveling salesman and
the spin glass problems, our genetic renormalization algorithm
finds solutions whose quality is far better than
those found by local search. In a more general context,
our approach may be considered as a systematic way to
improve upon state of the art local searches. A key to
this good performance is the treatment of multiple scales
by renormalization and recursion. The use of a population
of configurations then allows us to self-consistently
optimize the problem on all scales. Just as in divide and
conquer strategies, combinatorial complexity is handled
by considering a hierarchy of problems. But contrary
to those strategies, information in our system flows both
from small scales to large scales and back. Clearly such
a flow is necessary as a choice or selection of a pattern at
small scales may be validated only at much larger scales.

In this work, we put such principles together in a simple
manner; nevertheless, the genetic renormalization algorithm
we obtained compared very well with the state
of the art heuristics specially developed for the problems
investigated. Improvements in the population dynamics
and in the local search can make our approach even more
powerful. We thus expect genetic renormalization algorithms
to become widely used in the near future, both in
fundamental [?] and applied research.